The Journal of China Universities of Posts and Telecommunications (English edition), 2023, Vol. 30, Issue (5): 42-50. doi: 10.19682/j.cnki.1005-8885.2023.0009

Topic: Special Topic on Digital Human

Reliable pseudo-labeling prediction framework for new event type induction

杨琪1,徐雅静1,吕远2,肖波1,陈光1   

  1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2022-10-19  Revised: 2023-03-06  Online: 2023-10-31  Published: 2023-10-30
  • Contact: Ya-Jing XU  E-mail: xyj@bupt.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (62076031).

Abstract:

   As a subtask of open domain event extraction (ODEE), new event type induction aims to discover a set of unseen event types from a given corpus. Existing methods mostly adopt semi-supervised or unsupervised learning, which requires separate and complex objective functions for labeled and unlabeled data. To unify and simplify these objective functions, a reliable pseudo-labeling prediction (RPP) framework for new event type induction is proposed. The framework introduces a double label reassignment (DLR) strategy for unlabeled data based on swap-prediction. The DLR strategy alleviates the model degeneration caused by swap-prediction and further incorporates the real distribution over unseen event types to produce more reliable pseudo labels for unlabeled data. These reliable pseudo labels allow the overall model to be optimized with a single, simple objective. Experiments show that the RPP framework outperforms state-of-the-art methods on the benchmark.

Key words: open domain, event type induction, pseudo label, unified objective, swap-prediction
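
To make the idea of a unified objective built on swap-prediction more concrete, the following is a minimal sketch of one common generic formulation, not the RPP framework or the DLR strategy itself, whose details the abstract does not give. It assumes each instance is encoded as two views, and that the target derived for one view supervises the prediction of the other. For labeled data both targets would be the gold one-hot label; for unlabeled data they would be pseudo labels. All function and parameter names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, logits):
    """Cross-entropy between a (soft or one-hot) target and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def swapped_prediction_loss(logits_view1, logits_view2,
                            target_view1, target_view2):
    """Generic swapped-prediction loss (an assumed formulation).

    The target obtained from view 2 supervises view 1, and vice versa.
    The same function serves labeled instances (targets are the gold
    one-hot label) and unlabeled instances (targets are pseudo labels).
    """
    return 0.5 * (cross_entropy(target_view2, logits_view1)
                  + cross_entropy(target_view1, logits_view2))


# Labeled instance: both targets are the same gold one-hot vector.
gold = [1.0, 0.0, 0.0]
labeled_loss = swapped_prediction_loss([2.0, 0.1, 0.0], [1.5, 0.2, 0.0],
                                       gold, gold)

# Unlabeled instance: targets are (hypothetical) soft pseudo labels.
pseudo1 = [0.1, 0.8, 0.1]
pseudo2 = [0.2, 0.7, 0.1]
unlabeled_loss = swapped_prediction_loss([0.0, 1.2, 0.1], [0.1, 1.0, 0.0],
                                         pseudo1, pseudo2)
```

Under this assumption, the quality of the pseudo labels passed as targets determines how useful the unlabeled term is, which is the part the abstract attributes to the DLR strategy.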